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1 Introduction 



The theory of Algorithmic Differentiation (AD) is concerned with the automated generation of efficient algo- 
rithms for derivative computation of computational models. A computational model (CM) is the description 
of a mathematically expressed (scientific) problem as a computer program. That means that the CM is a 
composite function of elementary functions. From a mathematical point of view, only the operations *,+ 
together with their inverse operations /, — are elementary functions since they are required to define the field 
of real numbers R. However, there are good reasons to include other functions, e.g. those defined in the 
C-header math.h. The reason is firstly because algorithmic implementations of functions as exp,sin may 
contain non-differentiable computations and branches, furthermore one can use the additional structure to 
derive more efficient algorithms. E.g. to compute the univariate Taylor propagation of sin(£^ : Z 1 Xdf d ) can 
be done in ff(D 2 ) arithmetic operations by using the structural information that they are solutions of special 
ordinary differential equations (c.f. (61). Even of higher practical importance is the fact that deriving explicit 
formulas for functions as the trigonometric functions reduces the memory requirement of the reverse mode of 
AD to 0(D) since the intermediate steps do not have to be stored. That means, explicitly deriving derivative 
formulas for functions can yield much better performance and smaller memory requirements. In scientific 
computing, there are many functions that exhibit rich structural information. Among those, the linear algebra 
functions as they are for example implemented in LAPACK are of central importance. This motivates the 
authors' efforts to treat linear algebra routines as elementary functions. 



2 Related Work 



Computing derivatives in the forward mode of AD can be done by propagating polynomial factor rings 
through a program's computational graph. In the past, choices have been univariate Taylor polynomials 
of the form [x]o = Y,d=o x d tD > where x c i G R as described in the standard book "Evaluating Derivatives: 
Principles and Techniques of Algorithmic Differentiation" by Griewank j6]| and implemented e.g. in the 
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3 MATHEMATICAL DESCRIPTION 



AD tool ADOL-C (H. Also, multivariate Taylor polynomials of the form [x]d — LliKD-i-*?'^ where i is a 
multi-index and xi G R, have been successfully used, e.g. for high-order polynomial approximations JUG]- 
Univariate Taylor polynomials over matrices have also been considered, i.e. [A]o = Y!d=Q^dt d of fixed de- 
gree D — 1 with matrix valued coefficients Aj G R MxA ' for d = 0, ... ,D — 1. Very close to our work is Eric 
Phipps' PhD thesis 0. Phipps used the combined forward and reverse mode of AD for the linear algebra 
routines Ax = b,A~ l , C = AB, C = A o B in the context of a Taylor Series integrator for differential algebraic 
equations. A paper by Vetter in 1973 is also treating matrix differentials and the combination with ma- 
trix Taylor expansions (H. However, the focus on the paper is on results of matrix derivatives of the form 
D B A(B) G R m aM b xN a n b ^ where A e ^m a ,n a ^ b e ^m b ,n b anc i tne derivative computation is not put into the 

context of a computational procedure. 

An alternative to the Taylor propagation approach is to transform the computational graph to obtain a compu- 
tational procedure that computes derivatives. Higher order derivatives are computed by successively applying 
these transformations. There is a comprehensive and concise reference for first order matrix derivatives in the 
forward and reverse mode of AD by Giles flU. More sophisticated differentiated linear algebra algorithms are 
collected in an extended preprint version Q. The eigenvalue decomposition algorithm derived by Giles is a 
special case of the algorithm presented in this paper. 



3 Mathematical Description 
3.1 Notation for Composite Functions 

Typically, in the framework of AD one considers functions F : M. N — > IR M , x ^ y = F(x) that are built of 
elementary functions <j>. where x = (jci , . . y = (ji, . . . ,jm) with x m y m £ R for 1 < n < N, 1 < m < M. 

Instead, we look at functions 

N M 

F:0K„ -> 0K m (1) 

73=1 m=\ 

(xi,...,x N ) h> =F(xi,...,x N ) , (2) 

where K is some ring. Here K = (M MxA ', +, ■) where +, ■ the usual matrix matrix addition and multiplication.. 
If F maps to IR lxl we use the symbol / instead of F. For example f(x,y) = tr (xy + x) where x E M. N ' N ,y G 
^n,n can k £ wr jtten as 

f(x,y) = (j>4(<h(<l>i(x,y),(h(x))) = 04(03(V1,V 2 )) = 04(v 3 ). 

We use the notation v/ for the result of 0/ and Vj^i for all arguments of 0/. To be consistent the independent 
input arguments v„ are also written as v n -N = x n . To sum it up, the following three equations describe the 
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3.2 The Push Forward 



3 MATHEMATICAL DESCRIPTION 



function evaluation: 



V n -N = x n n=l,...,N 
vi = §i{v H i) l=l,...,L 

yM-m = V L -m M = M- 1,...,0, 



(3) 
(4) 
(5) 



where L is the number of calls to basics functions 0/ during the computation of F . 



3.2 The Push Forward 



We want to lift the computational procedure to work on the polynomial factor ring K[?]/? D K[?] with repre- 
sentatives [x]d '■ = T%=QXdt d - We define the push forward of a sufficiently smooth function in the following 
way: 



D-i , d d D -1 
?(/)([*]*) := E^-jp?/(E*0 

d=0 U • U ' c=0 



(6) 



f=0 



For a function v = /(x) we use the notation [y]o = "^(/)([jc]d). The definition of the push forward induces 
the usual addition and multiplication of ring elements 



f(mu\)([x] D ,\y] D ) = E^^CE^CE^ 

d=o a - aT c=0 



f( a dd)([x] D ,\y] D ) 



D—l 



k=0 
D-\ 



t=0 



d=0 M • u * c=0 /b=0 



;=0 



(7) 

(8) 



We now look at the properties of this definition: 

Proposition 1. For f : R K ->■ M M and g : — >■ sufficiently smooth functions we have 



~P*(fo g ) = ?(f)o?( g ). 



(9) 
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3 MATHEMATICAL DESCRIPTION 



3.3 ThePullback 



Proof. Let [x] D £ R N [t]/t D R N [t], then 



?(fog)([x] D ) = 



D-l 



D-l i d d 

E^ (/og)( E^ c ) 



rf=0 



c=0 
D-l 



r=0 



rf=0 " • u ' c=0 



f=0 



D-l i H d D-l 



d=0 

D-l ! d d 



6=0 

D-l 



f=0 



= (^(/)»^(s))(Wo). 



r=0 



□ 



This proposition is of central importance because it allows us to differentiate any composite function F by 
providing implementations for the push forwards P((j>) of a fixed set of elementary functions 0. 



3.3 The Pullback 



The differential df of a function / : R w ->■ R M ,x H> v = f(x) is a linear map between the tangent spaces, i.e. 
df(x) : T X R N H> r y M M . An element of the cotangent bundle T*R M can be written as 



M 



m=l 



i.e. a(y,y) maps any element of T y R M to 



M 



M 



M N -\rm 



m=l m=l 
N M ;v,m N 



m=l n=l 



T ^.. dy" 

= E E y™-^ 6 ^ = £ x n dx n = a(x,x) . 

n=lm=l dX n=l 



tr. 



(10) 



(11) 

(12) 



A pullback of a composite function fog is defined as P (/) := fog. To keep the notation simple we often 
use /=£(/)■ 
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3.4 The Pullback of Lifted Functions 



3 MATHEMATICAL DESCRIPTION 



3.4 The Pullback of Lifted Functions 

We want to lift the function a. Due to Proposition |3] we are allowed to decompose the global pullback to a 
sequence of pullbacks of elementary functions. 
Lemma 2. 

d(?(f)) = ~?(df) (13) 

Proof. Follows from the fact that the differential operators ^ and d interchange. □ 

Proposition 3. Let v/ = <j>i(vj^i) and (j>i some elementary function, vj^i are all arguments Vj of (j>i (c.f ^) 
. Then we have 

N) = L^(«)([vy],N). (14) 

Hi 

Proof. 

^(«)(N,N) = ntfiytivw])) 

= [l?/]D^(dVi)([vy^]) 

= E[v/]D^(|^)(W)^(d[vy]) 

Hi ° Vj 
= L[V;]d[vy] 

= L^(«)(N,N) 

□ 

These propositions tell us that we can use the reverse mode of AD on lifted functions by first computing a push 
forward and then go reverse step by step where algorithms for d0 must be provided that work on R[f]/fR[f]. 
It is necessary to store P (g^)([v/-</]) or [v/_</] during the forward evaluation. From the sum Y,j-<1 one can 
see that the pullback of a function 0/ is local, i.e. does not require information of any other Vj. In the context 
of a computational procedure one obtains 

[v;] = L[v^(^)(W). (15) 
Hk av i 
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3 MATHEMATICAL DESCRIPTION 



3.5 Univariate Taylor Propagation of Matrices 



3.5 Univariate Taylor Propagation of Matrices 

Now that we have introduced the AD machinery, we look at the two possibilities to differentiate linear algebra 
(LA) functions. One can regard LA functions as algorithms. Formally, this approach can be written as 



ft 



li 



[Yim y 



7(F) 



\\[Xn x i] ■■■ [ X N X M X ]\J 



\ 



(16) 



I.e. the function F is given a matrix with elements [X nm ] 6 R[f]/fR[f], A simple reformulation transforms 



such a matrix into a polynomial factor ring over matrices K 


[t]/tK 


t],K = ] 






VD vllfd v \M f d~ 

Ld=o A d r ••■ Ld=o A d 1 




D 


A d ■ 


v\M 
■ A d 




X^D v Nl t d \-D v NM f d 

Ld=o A d 1 ••• Ld=0 A d 1 . 




E 

d=0 


Y N1 
.d 


vNM 
■ A d 



(17) 



We denote from now on matrix polynomials as the rhs of Eqn. (TT71) as [X] . The formal procedure then reads 

[Y]=?(F)([X]), (18) 
where ~P*(F) must be provided as an algorithm on K. 



3.6 Pullback of Matrix Valued Functions 



Applying the reverse mode to a function /:R M ^l,X^y = f(x) yields 



df 



ydfiX) = J> 

n,m UA nm 



dX„ 



tr 



r d f 


df 1 




dX u ■ 






df 


df 






SXmn . 





dX\ i ... 6X\m 
AXn\ . . . 6Xnm 



\ 

tr (X T dX) 



<M 



Some well-known results (3l|2l| of the reverse mode for unary functions Y = F(X) are 



Y =X 1 
Y =X T 
y = tx(X) 



tr (Y T dY) =tr (-YY T YdX) 
tr (Y T dY) =tr (YdX) 
jJdtr (X) =tr (yldX) 



(19) 



(20) 

(21) 

(22) 
(23) 
(24) 
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4 PRELIMINARIES FOR THE RECTANGULAR QR AND EIGENVALUE DECOMPOSITION 



For binary functions Z = F(X, Y) one obtains tr (Z r dZ) = tr (X T dX) + tr (Y T dY). E.g for the matrix matrix 
muliplication one has 

Z = XY: tr (Z r dZ) =tr (YZ T dX + Z T XdY) . (25) 



4 Preliminaries for the Rectangular QR and Eigenvalue Decomposi- 
tion 

In this section we establish the notation and derive some basic lemmas that are used in the derivation of 
the push forward of the rectangular QR and eigenvalue decomposition of symmetric matrices with distinct 
eigenvalues. Both algorithms have as output special matrices, i.e. the upper tridiagonal matrix R and the 
diagonal matrix A. We write the algorithms in implicit form for general M. MxN matrices and enforce their 
structure by additional equations. E.g. an upper tridiagonal matrix R is an element of R NxN satisfying 
Pl°R — 0, where the matrix Pi is defined by (Pijij = (i > j), i- e - a strictly lower tridiagonal matrix with 
all ones below the diagonal. The binary operator o is the Hadamard product of matrices, i.e. element wise 

multiplication. We define YJd=o x dt D = Y^d=oydt D iff x d = yd f° r d = 0, . . . , D — 1 . 
Lemma 4. Let A e R NxN and P L resp. Pr defined as above. Then 



(PloA) 1 = P R oA T . (26) 



Proof. 



Bij := (P L oA) ij =A ij (i>j) 

Bjj = B ji =Aj l (j>i)=Af j P R = P R oA 

□ 

Lemma 5. Let X e M. NxN be an antisymmetric matrix, i.e. X T = —X and Pi defined as above. We then can 
write 

X = P L oX-(P L oX) T . (27) 

Proof. 

X = P L oX+P R oX = P L oX + (P L oX T ) T = P L oX-(P L oX) T 

□ 

Lemma 6. LetA,B,Ce R MxN . We then have 

tr (A T (BoC)) = tr (C T (BoA)) (28) 
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5 RECTANGULAR QR DECOMPOSITION 



Proof. 

N M 

tr(A T (BoC)) = J £ J L A nmBnmC m n = tr(C T (BoA)) 

n=l m=l 

Lemma 7. Let X £ M. NxN be antisymmetric, i.e. X T = —X and D £ Mr be a diagonal matrix. Then we 
have 

= Io (DX-XD). (29) 

Proof. Define A := XD and B := DX, then the elements of A and B are given by = £,Z^vD jkSjk — ^ij^kk 
resp. Bft = J^jDijdijXji^ = D^X^. Therefore the diagonal elements are Akk = Bkk = ^kk^kk- d 



5 Rectangular <2i? decomposition 

We derive algorithms for the push forward (Algorithm^ and pullback (Algorithmic]) of the QR decomposition 
Q : R = qr(A), where A, Q £ i? MxiV and 7? £ for M >N. 

Algorithm 8. Push forward of the Rectangular QR decomposition: 

A d £ R MxN , Q d £ R MxN and R d £ M A ' xA ', d = 0, ... ,D - 1. We assume M >N, i.e. A has more rows than 
columns. 

• given: [A] 0+E , 1 < E < D 

• compute [Q] D+E = [Q] D + [AQ] E t D , [R] D+E = [R\d + [AR] E t D 

known wanted known wanted 

~ StePl: [AF] E t D °= [Q]d[R]d-[A] d (30) 

[AG] E t D °± E 1-[Q T ] D [Q] D (31) 

- Step 2: F 

[H} E = [AA] E — [AF]e (32) 

[S] E = ~[AG] E (33) 



- Step 3: 



Pl°([X} e ) = Pl°([Q T }e[H} e [R- 1 }e) -Pl°[S}e (34) 

(35) 



- Step 4: F 

[K] E = [S]e + [X]e (36) 
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5 RECTANGULAR QR DECOMPOSITION 



- Step 5: 



Me = [Q] T e [H} e -[K}e[R}e 



(37) 



St6p6: = ([H]e-[Q]e[AR]e)[R] e 1 

Proof. The starting point is the implicit system 

D+E 



"=' [A} d+e -[Q]d+e[R]d+e 
[Q T ]d+e[Q]d+e-1 

D+E 



°- m 

P L o[R] D+E 



1. To derive the algorithm, this system of equations has to be solved: 



°± E [AA} E t D + [A} D -[Q] D [R} D -([Q} D [AR] E + [AQ] E [R] D )t D 



-:-[AF] E t D 



= [Q T }D[Q}D-l+([AQ T }E[Q} D +[Q T } D [AQl E )t D 

=:-{AG] Et D 
D ± E P L o[R] D+E 



(38) 



(39) 
(40) 
(41) 



2. Any matrix can be decomposed into a symmetric and an antisymmetric part: [Q t ]e [AQ]e = [X]e + [S]e 
with [Z]| = -[X] E and = [S} E . It then follows from Eqn. © that = -[AG] £ + 2[5] £ , i.e. we 
have [S]e = \{AG\ e . 

3. Insert the above relation into Eqn. (|39l ) and use Eqn. (I4TT) to obtain 



P L o[X} E = P L o 



uniquely defining [X] E . Therefore, we have 



[Q T ] E ([AA] E -[AF} E )[R 



-l 



V 



■[H]e 



[K]e := [2 r ] £ [A<2] £ = [5] £ + [X] £ . 

4. to obtain [AR] E we transform |39l and obtain 

Me = [Q T ]e[H]e-[K}e{R}e. 

5. finally, transform Eqn. (|39l) by right multiplication of [^j^ 1 : 

[AQ}e = ([H}e-[Q}eMe)[R\e 1 
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5 RECTANGULAR QR DECOMPOSITION 



□ 

The pullback transforms a function, here the QR decomposition to a function that computes the adjoint func- 
tion evaluation. Explicitely ~f (a) ( [A] , [A] ) = ~t (a) ( [Q] , [Q] ) + ~t (a) ( [R] , [R] ) . 

Algorithm 9. The pullback can be written in a single equation 

A = A + Q(R + P l o(RR t -RR t + Q t Q-Q t Q)R t ) + (Q-QQ t Q)R t . (42) 

For square A, i.e. A e R NxN the last term drops out. To lift the pullback one can use the same formula and use 
X = [X} D forX eitherA,<2,fl,A,<2or& 

□ 

Proof. We differentiate the implicit system 

= A-QR 

= Q T Q-l 
= P L oR 

and obtain 

= dA-dQR-QdR (*) 
= dQ T Q + Q T dQ (**). 

We define the antisymmetric "matrix" X := Q T dQ. Transforming equation (*) as Q T (*)R~ l yields 

= Q T dAR l -Q T QdRR~ 1 -Q T dQ 
therefore P L oX = P L o(Q T dAR^ 1 ) 
and thus X = P L oX - (P L oX) T . 

Left multiplication Q T (*) yields dR = Q T dA-XR and transformation (*)R^ yields dQ = (dA - gd/?)/? -1 . 
We are now in place to calculate 

K(Q T dQ)+tr(R T dR) = tr (Q T (dA - QdR)R 1 ) +tr (^ r d7?) 

= tr ( J R" 1 2 r dA)+tr ((R T -R l Q T Q)dR) 

11 v ' 

=:F 

= tr (R- 1 Q T dA)+tr(F(Q T dA-XR)) 

= \x{{R- l Q T +FQ T )dA)+tr(-RFX) 

= tr ((R- l Q T +FQ T )dA)+tr (-RF(P L oX - (P L oX) T )) 

= tr ((R- l Q T +FQ T )dA)+tr (-RF(P L o Q T dAR 1 - (P L oQ T dAR- 1 ) T )) 

= tr ((QR T + QF T )dA T ) + tr (R~ T dA T Q(P L o (RF - F T R T )) 

= tr ((QF T + QR T + QP L (RF -F T R T )R T )dA T ) 

therefore A = A + Q(R + P l o (Q t Q-Q t Q + RR t -RR t )R t ) + (Q-QQ T Q)R T . 
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6 EIGENVALUE DECOMPOSITION OF SYMMETRIC MATRICES WITH DISTINCT 

EIGENVALUES 



In the above derivation we have used Lemmas [51 @] and [6l 

□ 



6 Eigenvalue Decomposition of Symmetric Matrices with Distinct Eigen- 
values 

We compute for A £ R NxN a symmetric matrix with distinct eigenvalues the eigenvalue decomposition AQ = 

QA, where A £ R. NxN is a diagonal matrix and Q £ M. NxN an orthonormal matrix. 

Algorithm 10. Push Forward of the Symmetric Eigenvalue Decomposition with Distinct Eigenvalues: 

We assume [Q]d+e = [Q]d+ [AQ]Et D and [A]d + £ = [A] D -f- [AA]Et D , where [<2]d and [A]d have already been 

computed. I.e., it is the goal to compute the next£ coefficients [AQ]e and [AA]#. 



Step 1: 



Step 2: 

Step 3: 
Step 4: 
Step 5: 
Step 6: 



[AF] E t D u ¥ [Q T ] D [A] D [Q] D -[A] D (43) 

[AG] E t D [Q T ] D [Q} D -1 (44) 



[S} E = - l -[AG] E (45) 



[K] E = [AF] £ + [e y ] £ [AA] £ [2] £ + [5] £ [A] £ + [A] £ [5] £ (46) 
[AA] £ = lo [K] E (47) 
[Hij} E = {[We-W 1 if i±h else (48) 



[AQ}e = [Q}e([H} e o([K}e-[AA}e) + [S)e) (49) 
Proof. We solve the implicit system 

°= [Q T }d + e[A] d+e [Q]d + e-[A] d+e 

°= [Q T ]d + e[Q]d+e-1 

°= (Pl+Pr)o[A] d+e . 
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6 EIGENVALUE DECOMPOSITION OF S YMMETRIC MATRICES WITH DISTINCT 
EIGENVALUES 



We split [A]/)+£ = [A]d + [AA]£t D in the known part [A]d and unknown part [AA]^. For ease of notation we 
use A = [A]d and AA = [AA]e in the following. The first equation can be transformed as follows: 

°= [Q T ]d + e[A] d+ e[Q]d+e-[A] d+e 

(Q T + AQ T t D ) (A + AAt D ) (Q + AQt D ) - (A + AAt D ) 
°= Q T AQ - A + (Q T AAQ + Q T AAQ + AQ T AQ - AA) t D 

=:AFt D 

From the second equation we get 

AG = Q T AQ + AQ T Q, 



which can be written as 



S+X = AQ T Q 



where S is symmetric and X is antisymmetric. 
Using AQ — QA we get 

= AF -AA + Q 1 AAQ + AQ 1 AQ + AQ 1 QA 
AA = AF + Q T AAQ + A(S-X) + (S + X)A 

Using the structural information that A is a diagonal matrix we obtain 

AA = lo (AF + Q T AAQ + A(S-X) + (S+X)A) 
= lo(AF + Q T AAQ + AS + SA) , 



=:K 



since I o (XA — AX) = from Lemma|7J Now that we have AA we can compute AQ: 



= K-AA + XA-AX 
= K-AA + EoX 



where Etj = Ajj — An. Define Htj = (Ajj — A,-,-) 1 for i ^ j and else: 

X T = Ho(K-AA) . 



Using X T = Q T AQ - S we obtain 
This concludes the proof. □ 



A£ = Q(Ho(K-AA)+S) . 
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6 EIGENVALUE DECOMPOSITION OF SYMMETRIC MATRICES WITH DISTINCT 

EIGENVALUES 



We now derive the pullback formulas for the eigenvalue decomposition. The pullback acts as P : (A h-> 

e,A)^((e,A,e,A,A)^A) 

Algorithm 11. Pullback of the Symmetric Eigenvalue Decomposition with Distinct Eigenvalues: 
Given A, <2, A, <2, A, compute A 

[Hijh = (Md-Md) -1 if else (50) 

[A]d = [Q]d([A]d + [H] d o([Q t ] d [Q] d ))[Q t ] d (51) 

iVoo/ We want to compute tr (A T dA) = tr (A r dA) + tr (Q T dQ) . We differentiate the implicit system 

= Q T AQ-A 
= Q T Q-l 



= (P l + Pr)oA 



and obtain 



dA = dQ T AQ + Q T dAQ + Q T AdQ 
= dQ T Q + Q T dQ. 

A straight forward calculation shows: 

tr(A T dA) = tr (AQA T dQ T ) +tr (AQ T AdQ) +tr (QAQ T dA) 

= tr (Q T AQA T dQ T Q) + tr (AQ T AQQ T dQ) + tr (QAQ T dA) 

= tr (AAdQ T Q) + tr (AAQ T dQ) + tr (QAQ T dA) 

= tr(QAQ T dA), 

tr(Q T dQ) = (Q T QQ T dQ) 

= tr(Q T Q(Ho(Q T dAQ))) 

= tr(Q T dA T Q(Ho(Q T Q))) 

= tr(Q(H T o(Q T Q))Q T dA), 

tr(A r dA) = tr {(Q(A + H T o (Q T Q))Q T )dA) 

where we have used 

= AQ-QA 
=^0 = dA<2 +AdQ - dQA - QdA 

= dAQ + QQ T AQQ T dQ-QQ T dQA-QdA 
= dAQ-Q(Ko(Q T dQ))-QdA 
Q T dQ = Ho(Q T dAQ-dA) 
= Ho(Q T dAQ) 

where we have defined K t j := Ajj — An and H t j = (Kij)~ l for i ^ j and Hij = otherwise and used the 
property AX - XA = K o X with Ky = Ajj - A u for all X eR NxN and diagonal AeR NxN . □ 
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7 Software Implementation 

The algorithms presented here are implemented in the Python AD tool ALGOPY O. It is BSD licensed and can 
be publicly accessed at www . github . com/b45chl/algopy. The goal is to provide code that is useful not only 
for end users but also serving as repository of tested algorithms that can be easily ported to other programming 
languages or incorporated in other AD tools. The focus is on the implementation of polynomial factor rings 
and not so much on the efficient implementation of the global derivative accumulation. The global derivative 
accumulation is implemented by use of a code tracer like ADOL-C but stores the computational procedure 
not on a sequential tape but in computational graph where each function node also knows it's parents, i.e. the 
directed acyclic graph is stored in a doubly linked list. 

At the moment, prototypes for univariate Taylor propagation of scalars, cross Taylor propagation of scalars 
and univariate Taylor propagation of matrices are implemented. It is still in a pre-alpha stage and the API is 
very likely to change. 



8 Preliminary Runtime Comparison 

This section gives a rough overview of the runtime behavior of the algorithms derived in this paper com- 
pared to the alternative approach of differentiating the linear algebra routines. We have implemented a QR 
decomposition algorithm that can be traced with PYADOLC. The results are shown and interpreted in Figure 
\T\ In another test we measure the ratio between the push forward runtime and the normal function evaluation 
runtime: 

TIME(push forward) /TIME(normal) w 1 1.79 

for the QR decomposition for A G R 100x5 up to degree D — 4 and five parallel evaluations at once. For the 
eigenvalue decomposition we obtain 

TEVIE(push forward)/TIME(normal) « 11.88 

for A G R 20x20 , D = 4 and five parallel evaluations. This test is part of ALGOPY [|9]|. 

9 Example Program: Gradient Evaluation of an Optimum Experi- 
mental Design objective function 

The purpose of this section is to show a motivating example from optimum experimental design where these 
algorithms are necessary to compute the gradient of the objective function 4> in a numerically stable way. The 
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Figure 1: In the left plot the comparison between the push forward of PYADOLC and ALGOPY for D = 4. In 
the right plot the lifted pullback for D = 4. On the x-axis we plot the size N of the test matrices A 6 M A ' xA ' 
and on the j-axis the runtime ratio. Unfortunately, there are significant fluctuation in the relative runtime 
measurements. Therefore, we have repeated each test 10 times and plotted mean and standard deviation. 
Nonetheless it is obvious that for large matrices the UTPM implementation in ALGOPY clearly outperforms the 
UTPS differentiated algorithm using PYADOLC. We must stress that the plots only indicate the actual runtime 
ratio that would be obtained by efficient FORTRAN/C/C++ implementation of both methods. There are also 
many possibilities to improve the performance of PYADOLC, e.g. by adjusting the buffer sizes of ADOL-C or 
by using direct LAPACK calls instead of the standard numpy . dot function in ALGOPY. 
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algorithmic procedure to compute <J> is given as a straight-line program 



F 



F{x,y) 



Q,R 



J 



dF 
dy~ 

qr(7) 



A,U 
4> 



D 



C 



solve (R, I) 
DD T 
eig (C) 
An 



where F E IR^" 1 , 7 G R, 7 G M. NmXN y. The gi? decomposition is used for numerical stability reasons since 
otherwise the multiplication of J T J would square the condition number. 

To be able to check the correctness of the computed gradient we use a simple F(x,y) that allows us to de- 
rive an analytical solution by symbolic differentiation. We use F(x,y) = Bxy where B E M. N,nXNx is a ran- 
domly initialized matrix, x E M. Nx and yGK. Thus, the objective function is <&(y) = y~ 2 X\((B T B)~ l and thus 
V y <t>(y) = — 2y~ 3 X\((B T B)~ l . We use N x = 11. A typical test run where the symbolical solution and the AD 
solution are compared yields 



This example is part of ALGOPY flU. 
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